Critical and Near-Critical Branching Processes 
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Scale-free dynamics in physical and biological systems can arise from a variety of causes. Here, 
we explore a branching process which leads to such dynamics. We find conditions for the appearance 
of power laws and study quantitatively what happens to these power laws when such conditions are 
violated. From a branching process model, we predict the behavior of two systems which seem to 
exhibit near scale-free behavior — rank-frequency distributions of number of subtaxa in biology, and 
abundance distributions of genotypes in an artificial life system. In the light of these, we discuss 
distributions of avalanche sizes in the Bak-Tang-Wiesenfeld sandpile model. 



I. INTRODUCTION 

Scale-free distributions, or power laws, have been ob- 
served in a variety of biological, chemical and physical 
systems. Such distributions can arise from different un- 
derlying mechanisms, but always involve a separation of 
scales, which forces the distribution to take a standard 
form. Scale-free distributions are most often observed in 
the distribution of sizes of events (such as the Gutenberg- 
Richter law Q), the distribution of times between events 
(e.g., the inter-event interval distribution in neuronal 
spike trains [Q]), and frequencies. An example of the 
latter is the well-known and ubiquitous 1// noise. Some 
systems are even more interesting because they seem to 
exhibit self-organization or self-tuning, concomitant with 
scale-free behavior as an inherent and robust property of 
the system, not due to the tuning of a control parameter 
by the experimenter. 

Two systems to which such spontaneous scale-free be- 
havior has been attributed are sandpile models and taxon 
creation in biological systems. The former has served 
as the paradigm of "self-organized criticality" (SOC) ||, 
while the latter, manifested in the form of near power 
law shapes of rank-abundance curves, has been advanced 
as evidence of a fractal geometry of evolution B . 

A much simpler system where power laws are observed 
is the random walk For example, the waiting times 
t for first return to zero of the simple random walk in 
one dimension (starting at x = 0, at each time step, 
x(t + 1) = x(t) ± 1 with equal probability) have a prob- 
ability distribution ~ t~ 3 / 2 . Closely related to ran- 
dom walks, branching processes Q| can also create power 
law distributions. They have been used to model the 
dynamics of many systems in a wide variety of disci- 
plines, including demography, genetics, ecology, physiol- 
ogy, chemistry, nuclear physics, and astrophysics. Here, 
we use a branching process to model the creation and 
growth of evolutionary taxa, and discuss its application 
to avalanches in SOC sandpile models. 

In Section II, we examine the properties of the Galton- 
Watson process. We find that this process can generate 
power laws by appropriate tuning of a control parame- 
ter, and examine the dynamics of the system both at the 



critical point and away from it. In Section III, we ap- 
ply this branching process model to the taxonomic rank- 
frequency abundance patterns of evolution, and discuss 
the universality of their underlying dynamics. Finally, 
in Section IV, we discuss the implications of our work, 
including a discussion of the order and control parame- 
ters for the branching process and its applications, and 
suggest further questions. 



II. THE BRANCHING PROCESS 

The Galton- Watson branching process was first in- 
troduced in 1874 to explain the disappearance of fam- 
ily names among the British peerage [Q. It is the first 
branching process in the literature, and also one of the 
simplest. Consider an organism that replicates. The 
number of replicants (daughters) it spawns is determined 
probabilistically, with pi (i — 0, 1,2...) being the proba- 
bility of having i daughters. Each daughter replicates 
(with the same pi as the original organism) and the 
daughter's daughters replicate and so on. We are inter- 
ested in the rank-frequency probability distribution P(n) 
of the total number of organisms descended from this or- 
ganism plus 1 (for the original organism), i.e., the histor- 
ical size of the "colony" the ancestral replicant has given 
rise to. Note that this is equivalent to asking for the 
probability distribution of the length of a random walk 
starting from 1 and returning to with step sizes given 
by P(An) =p 4 _x (» = 0,1,2...) §]. 

The abundance distribution P(n) can be found by 
defining a generating function 



F(s)=Y / PW- (1) 

i=i 

This function satisfies the relationship 

oo 

F(s)=sJ2Pi[F(s)l\ (2) 

i=0 

from which each P(n) can be determined by equating co- 
efficients of the same order in s || . This result can also 
be written as 
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P(n) = -Q( n ,n- 1) 
n 



(k>l), 



(3) 



where Q(i,j) is defined as the probability that j organ- 
isms will give birth to a total of i true daughters [^). 
However, these approaches are not numerically efficient, 
as the calculation of P(n) for each new value of n requires 
re-calculation of each term in the result. 

For our present purposes, we approach the problem 
in a different manner. Let Puj be the probability that 
given j original organisms, we end up with a total of k 
organisms after all organisms have finished replicating. 
Obviously, 



(k < j), 



(4) 



since it is impossible to have less total organisms than 
one starts out with, and 



^111 =Po, 



(5) 



i.e., the probability for one organism to have no daugh- 
ters. A little less obviously, 
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Pk\i = ^2pjP(k-t)\j, 
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These equations allow us to use dynamic programming 
techniques to calculate P(n) (= P n \i), significantly re- 
ducing the computational time required. Also, from Eq. 
(^J), we can write 



Pn\l 
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P(n-l)|l P(n-l)\l " P{n-V)\\ 

Since, for n — > oo, P n y is uniformly decreasing, we see 
P(n) P„|i 



(8) 



P(rc-l) P (n _ m 



C asmoo, (C < 1) (9) 



where C is a constant. C indicates the asymptotic be- 
havior of P(n) as n — > oo. If C < 1, the probability dis- 
tribution is asymptotically exponential, while for C = 1, 
the probability distribution is a power law with exponent 
-3/2. 

Let us now examine the behavior of P(n) when n < 
10 4 , the more relevant case in the examples to follow. Us- 
ing Eqs. (§)-(0), we can numerically calculate P(n) for 
different sets of pi. We define m as the expected number 
of daughters per organism, given a set of probabilities pf, 



E 



i - Pi 



(10) 



We see that the branching rate m (the control parameter) 
is a good indicator of the shape of the probability curve 



(Fig.0). When m is close to 1, the distribution is nearly a 
power law, and the further m diverges from 1 , the further 
the curve diverges from a power law towards an exponen- 
tial. When to = 1/2, the curve is completely exponential. 
For a population of organisms, to is a measure of the ten- 
dency for new generations to grow, or shrink, in number. 
A value of to > 1 indicates a growing generation size, 
which implies that there will, on average, be no gener- 
ation with no daughters, and that the expected number 
of total organisms is infinite. Conversely, to < 1 indi- 
cates a shrinking population size: There will be a final 
generation with no daughters, and the expected number 
of organisms is finite. When m = 1, the system is in 
between the two regimes (the system is said to be "criti- 
cal" ), and only then is a power law distribution found. In 
general, the branching rate is determined by the ratio of 
the rate of introduction of competitors R c to the intrinsic 
rate of growth of existing assemblages R p via 



- = (i + ^r 1 

Hp 



(11) 



as can be shown by assuming stationarity. As this ratio 
goes to 0, m — * 1 and the system becomes critical. 

In the following section, we explore systems where the 
"organisms" are individual members of species or taxa in 
a taxonomic tree, and to is the average number of exact 
copies an individual makes of itself, or the average num- 
ber of new taxa of the same supertaxon a taxon spawns, 
respectively. The same thinking can be applied to tum- 
bling sites in a sandpile model, where m would stand for 
the average number of new tumbles directly caused by a 
tumbling site. 



1 1 


1 1 

m=0.999 




\m=0.77 " 
i \ 



10" 



10' 



10' 



FIG. 1. Predicted abundance patterns P(n) of the branch- 
ing model with different values of m. The curves have been 
individually re-scaled. 
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III. APPLICATIONS 



A. Neutral Model 




We first present a simple simulation to test our analysis 
and lay the groundwork for the exploration of more com- 
plicated systems. Consider a population of organisms on 
a finite two-dimensional Euclidean lattice, one organism 
to a grid square. Each organism can be viable or ster- 
ile. All viable organisms replicate approximately every r 
time steps (there is a small random component to each 
individual's replication time to avoid synchronization ef- 
fects), while sterile organisms do not replicate. When an 
organism replicates, its daughter replaces the oldest or- 
ganism in the parent's 9-site neighborhood (Fig. ||). We 
define the fidelity F as the probability that the organ- 
ism will create a daughter of the same type as itself and 
the corresponding genomic mutation rate R (= 1 — F) 
at which it creates copies different from itself. The ge- 
nomic mutation rate is actually the sum of two rates, a 
probability R n for the daughter to be viable but to be of 
a new genotype, different from that of the parent [neu- 
trality rate), and a probability R s of the daughter being 
sterile. Of course, R n + R s = R. Note that all viable 
mutant daughters still share the same replication time 
r — all mutations are neutral (See Fig. ||). Such a system 
gives rise to abundance distributions of power law and 
near-power law type, which can be predicted as follows. 



FIG. 2. Neutral model grid. The organisms live on an Eu- 
clidean grid, one organism to a site. When an organism repli- 
cates, its daughter replaces the oldest organism in the 9-site 
neighborhood. (If the organism marked by a black dot repli- 
cates, its daughter replaces one of the organisms at a gray 
site.) 



FIG. 3. Neutral replications and mutations. An organism's 
daughter is of the same genotype as the organism with prob- 
ability F, it is of a new, viable genotype with probability R n , 
and it is sterile with probability R a such that F + R n + R s — 1. 

The total number of organisms is determined by the 
size of the grid. We write equilibrium conditions for the 
total number of organisms pa, and for the total number 
of viable organisms py , 



Ap A ~ ap v - Pa = 0, 
Ap v - vp v - pv = 0, 



(12) 
(13) 



where a is the average number of daughters (viable and 
sterile) a viable organism has, and v is the average num- 
ber of viable daughters a viable organism has. Introduc- 
ing to — the average number of true daughters (daugh- 
ters which share the parent's genotype) for a viable 
organism — we see that 



F + Rn f . D s 

v = — m = {r + R n )a. 

r 



(14) 



From Eqs. ( 12)-([14|), we obtain steady state solutions for 
a and to, 



F- 1 

' F 



l + 5a' 

' F 



(15) 
(16) 



Using the branching process model, we can predict the 
abundance curve from the values of a and to (or con- 
versely, F and Rn). Fig. [| shows abundance data for two 
neutral model runs with differing values of R n (and con- 
sequently to), along with predicted distributions (which 
use only R n and F as parameters) based on the branch- 
ing model. Although the distribution patterns are very 
different, both are fit extremely well by the branching 
process model's predicted curves. In Eq. ([i"6|), note that 
R n is the rate of influx of new genotypes (and therefore 
new competitors for space) , while F is the rate of growth 
of existing genotypes. The value of to is determined by 
the ratio of these two rates. Unless the total number 
of creatures is increasing, to < 1 (m = 1 if and only if 
R n — > and new competing genotypes are introduced at 
a vanishing rate). 
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FIG. 4. Abundance distributions and predicted curves for 
two neutral model runs. The run shown by circles (~ 1.5 
million data points) had a grid size of 3000 x 3000, F — 0.5, 
and R n — 0.5, while the one represented by crosses (~ 0.6 
million data points) had a grid size of 100 x 100, F = 0.2, and 
Rn = 0.1. 



B. Artificial Life 

Our next system is the artificial life system sanda ||, 
an example of environments which host digital organ- 
isms JT~0) ] . In this system, while the organisms occupy 
a two-dimensional grid as in the neutral model detailed 
above, the organisms are no longer simple, and instead 
each has a complex genotype consisting of a string of as- 
sembly language-like instructions (Fig. Each organ- 
ism independently executes the instructions of its geno- 
type, and this genotype determines the organism's repli- 
cation time r. Unlike the neutral model, the system al- 
lows non-neutral mutations which lead to new genotypes 
with both lower and higher replication times than the 
parent. 

The system and the instructions are designed so that 
the organisms can self-replicate by executing certain se- 
quences of instructions. The replication time of an or- 
ganism is not a predetermined constant, rather it is de- 
termined by the genotype of the organism: Organisms 
can replicate faster or slower than other competing or- 
ganisms with different genotypes. For an organism to 
successfully replicate, its genotype must contain infor- 
mation which allows the organism to allocate temporary 
space (memory) for its daughter, replicate its genotype 
(one instruction at a time) into this temporary space, 
and then to divide, placing its daughter in a grid site of 
its own (Fig. ||). As in the neutral model, on division, 
the daughter replaces the oldest organism in its parent's 
9-site neighborhood. 

Organisms, depending on their genotype, may not be 
able to replicate (may be sterile) or may only be able 
to replicate imperfectly (resulting in no true daughters). 
Also, the copy instruction, which the organisms must use 
to copy instructions from their own code into that of their 
nascent daughters, has a probability of failure {copy mu- 



tation rate), which can be set by the experimenter. When 
the copy instruction fails, an instruction is randomly cho- 
sen from all the instructions available to the organisms 
(the instruction set) and written in the string location 
copied to. Copy mutations also lead to non-true daugh- 
ters. The instruction set is robust; copy errors (muta- 
tions) induced during the replication of viable organisms 
have a non-vanishing probability of creating viable new 
organisms and genotypes. Indeed, by selecting for cer- 
tain traits (such as the ability to perform binary logical 
operations) by increasing the relative speed at which in- 
structions are executed in organisms which carry these 
traits, the system can be forced to evolve and find novel 
genotypes which contain more information (and less en- 
tropy) than their ancestors. Even without this external 
selection, the system evolves organisms (and genotypes) 
which replicate more efficiently in less executed instruc- 
tions. 

As a result of this evolution, the fidelity and neutral 
mutation rate are not fixed, but can vary with the length 
of an organism's genome and the instructions contained 
therein. Also, new genotypes formed by beneficial muta- 
tions that allow faster replication than previously exist- 
ing genotypes will have (on average) an increasing num- 
ber of organisms — m > 1 — until the new, faster replicat- 
ing genotypes fill up a sizable portion of the grid. All 
these factors combine to make predicting the abundance 
distributions for sanda much harder than for the neutral 
model. 
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FIG. 5. Example sanda genotype. Sanda organisms have 
genotypes which are strings of sanda code. The string shown 
above replicates by: searching forward (instruction 1) for the 
complement of the template nop-A nop-A (2-3), which is nop-B 
nop-B (21-22), manipulating this value in an internal register 
to find the genotype length (4-5), allocating enough mem- 
ory to store code of genotype length (6), setting registers to 
prepare for copying (7-11), copying the instructions one at a 
time (12-19) until all instructions have been copied (15-16), 
and replicating (20) — placing the daughter in its own grid site. 
Execution restarts at the beginning of the genotype when the 
end of the genotype is reached, and continues until the organ- 
ism is replaced by the newly replicated daughter of another 
organism (or its own daughter). The copy command (14 in 
this particular genotype) fails and writes a random instruc- 
tion with probability 7. 
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Indeed, rather than being constant during the course 
of a sanda experiment, R n and F will vary unpredictably 
as the population of organisms occupies different areas in 
genotypic phase space. Certain genotypes may be brit- 
tle, allowing very few mutations that result in new viable 
genotypes. The length of the organisms may change, 
changing both the genomic mutation rate and the neu- 
trality rate. Genotypes exist which make systematic er- 
rors when copying, which decreases the fidelity. In short, 
the dynamics of these digital organisms are complex and 
messy, much like those of their biochemical brethren. 
These variations are observed at the same time across 
different organisms in the population, and are also ob- 
served with the progression of time. Still, we attempt 
to predict the abundance distributions by approximating 
the ratio of neutral mutations to true copies by the ob- 
served ratio of viable genotypes to total number of viable 
organisms ever created: 



~~F 



Kg 



(17) 



where N g is the total number of viable genotypes ob- 
served during a sanda run and N v is the total number 
of viable organisms. This relation should hold approx- 
imately under equilibrium conditions. Then, Eq. ( |l6| ) 
becomes 



^(i + f)-\ 



and from Eq. ( |l5| ) 



m 
~F' 



(18) 



(19) 



The fidelity F is inferred from the average length / of 
genotypes during a run and the (externally enforced) per- 
instruction copy mutation rate 7, F — (1 — 7)'. Because 
we estimate m and a from macroscopic observables aver- 
aged over the length of a run, we expect some error in our 
results due to the shifting dynamics of the evolution of 
genotypes as the system moves in genotypic phase space. 
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FIG. 6. Abundance data from two sanda runs with pre- 
dicted abundance curves. Both runs were started with 
the same initial genotype for all organisms, the same 
per- instruction copy mutation rate (7), and the same grid 
size (100 x 100). Run 192's genotypes evolved into a regime of 
genotypic phase space with longer average length, and there- 
fore lower fidelity F and higher neutrality R n , than Run 132, 
resulting in the differences in the abundance distributions. 
The predicted curves were generated by approximating a rep- 
resentative value of R„/F from the ratio of the number of 
viable genotypes to the number of viable organisms observed 
over the run. The data was binned using the template thresh- 
old method with T = 1 (see Appendix). 

The abundance data from two different sanda runs 
are shown in Fig. ^ with the predicted abundance 
curves. The two runs shared the same grid size and per- 
instruction copy mutation rate, and were started with the 
same initial genotypes, but the runs evolved into different 
regions of genotypic phase space and consequently had 
significantly different statistics. Considering the many 
gross approximations we have made, the agreement be- 
tween our prediction and the experimental data is sur- 
prisingly good (especially as no fitting is involved). Sanda 
is most closely related to an asexually replicating bio- 
logical population, such as colonies of certain types of 
bacteria occupying a single niche. The genotype abun- 
dance distributions measured in sanda are analogous to 
the species or subspecies abundance distributions of its 
biological counterparts. In general, species abundance 
distributions are complicated by the effects of sexual re- 
production, and of the localized and variable influences 
of other species and the environment on species abun- 
dances. However, we believe the branching model — used 
judiciously — can be helpful in the study of such distribu- 
tions as well. 



C. Evolution 

Rank-abundance distributions at taxonomic levels 
higher than species (e.g., the distribution of the number 
of families per order) are simpler to model than species 
abundance distributions, as the effects of the compli- 
cations noted above are weak or nonexistent. We find 
that the available data is well fit by assuming no di- 
rect interaction or fitness difference between taxa [ pT| . 
The shapes of rank-frequency distributions of taxonomic 
and evolutionary assemblages found in nature are surpris- 
ingly uniform. Indeed, Burlando has speculated that all 
higher-order taxonomic rank-frequency distributions fol- 
low power laws stemming from underlying fractal dynam- 
ics Q. We believe this conclusion is hasty: The diver- 
gence of the distributions from power law can be observed 
by applying appropriate binning methods to the data. 
(See Appendix.) Yule 1 12 attempted a branching process 
model explanation of these distributions, and claimed 
that divergence from power law of rank-abundance pat- 
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terns was transient and indicated a finite time since the 
creation of the evolutionary assemblage. Our model in- 
dicates that this is not generally the case. We find that 
the divergence from power law is not a result of disequi- 
libration, but is an inherent property of the evolutionary 
assemblage under consideration and that this divergence 
provides insight into microscopic properties of the assem- 
blage (e.g., the rate of innovation). 

Say, for example, that we are interested in the rank- 
frequency distribution of the number of families in each 
order for fossil marine animal orders. We assume that 
all new families and orders in this assemblage originate 
from mutations in extant families. Then, we can define 
rates of successful mutation Rf for mutations which cre- 
ate new families in the same order as the original family, 
and R Q for mutations which create an entirely new order. 
In this case, unlike the cases treated above, we approxi- 
mate a — ► oo; many individual births and mutations oc- 
cur, but the proportion that are family- or order-forming 
is minuscule. Finally, assuming a quasi-steady state (the 
total numbers of orders and families vary slowly 13 1), we 
rewrite Eq. (plf), 



(1 



No 
N f 



(20) 
(21) 



in terms of N a and Nf, the total numbers of orders and 
families respectively. As in the previous systems studied, 
R is the rate of creation of new — competing — orders, 
while Rf is the rate of growth of existing orders, and m 
is determined by their ratio. 




FIG. 7. The rank- frequency distribution of fossil marine 
animal orders (squares) Q and the predicted abundance 
curve (line). The predicted curve was generated — with no 
free parameters — by approximating R„/F by N /Nf = 0.115. 
The empirical distribution agrees with the predicted curve 
with significance 0.12 using the Kolmogorov-Smirnov test |nj . 
The fossil data is shown binned using the template threshold 
binning method explained in the Appendix with T = 1. 

Data for the abundance distribution of number of fami- 



lies in fossil marine animal orders |l4| are shown in Fig. [?]. 
We obtained values for N a and Nf directly from the fossil 
data to generate the predicted curve with no free parame- 
ters. The agreement is very good, much better than that 
for the sanda runs where evolutionary parameters such 
as the fidelity F and the neutrality R n were constantly 
changing. Comparing m and the resultant abundance 
curves with those obtained above for the rank- abundance 
distribution of sanda genotypes leads us to the expected 
conclusion that the probability of creation of a new geno- 
type in sanda per birth is much higher than the probabil- 
ity of a new family creating an order in natural evolution. 
Indeed, a wide variety of higher-order taxonomic assem- 
blages have abundance distributions consistent with m 
near 1 J| . We believe this is a robust result of the evo- 
lutionary process. Low values of m may not be observed 
for large taxon assemblages for several reasons. A small 
value of m implies either a small number of individuals 
in the assemblage, or a very specialized niche with a very 
low rate of taxon formation. A low number of individ- 
uals would lead to a low probability of the taxon being 
discovered and cataloged by biologists. A small number 
of individuals and taxa would result in an assemblage 
with too few taxa to give us a clear statistical picture. 
Also, since such an assemblage would have a small pop- 
ulation, be incapable of further adaptation, or both, we 
expect it would be more susceptible to competition and 
environmental effects leading to early extinction. 



D. Sandpile Models 



It was originally suggested that the self-organization 
observed in the sandpile model of Bak, Tang and Wicsen- 
feld (BTW) H (and the power laws it displayed) was an 
inherent property of the system, while it now seems es- 
tablished that the system is actually tuned by waiting un- 
til avalanches are over before dropping new grains — this 
is equivalent to allowing non-local interactions |]l5| , |l6| . 
The same conclusion is reached when using a branch- 
ing process to describe the avalanche dynamics. Branch- 
ing processes have been applied to sandpile models as 
early as 1988 @ (see also, ]l9|-||,0). Using a mean- 
field approach in higher dimensions (d > 4), power law 
distributions for the size of avalanches s(n) can be ob- 
tained analytically, and critical exponents can be calcu- 
lated exactly to reveal s(n) ~ 7i~ 3 / 2 |l7]] in the limit of in- 
finitesimally small driving. This is supported by numer- 
ical simulations. However, for lower dimensions, sand- 
piles will "interfere" with themselves, and a smaller ex- 
ponent is found. Attempts to calculate the effects of this 
"final-state" interaction through renormalization have as 
yet not been completely successful [ p3[ . Still, the phe- 
nomenon of "violations" of power-law behavior due to 
to < 1 (non-critical branching process) can be seen there 
as well. 
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IV. DISCUSSION 

The Galton- Watson branching process generates power 
law distributions when its control parameter m — 1. In 
all the systems we have examined above, 

m=(l + ^)- 1 (22) 
K p 

is determined by the ratio of the rate of introduction of 
competitors R c to the intrinsic rate of growth of existing 
assemblages R p . As this ratio goes to 0, m — > 1 and the 
system becomes critical. 

This relation can be translated into the standard rela- 
tion between an order parameter 

« = £ (23) 

and a new form for the control parameter 

fj, = m~ 1 . (24) 
Writing a in terms of fx, 

n/= \ (m-aO' 3 Ut > Mc), 
I o 0*</i c ), 

where \i c = 1 and (3=1 (Fig. ^|). The order param- 
eter represents the rate at which competition is intro- 
duced in the system (the strength of selection) . A value 
of the control parameter fi < fi c implies a system with no 
competition and no selection — an exponentially growing 
population. Values of fi higher than /i c indicate that new 
competition is always being introduced and that all exist- 
ing species or avalanches must eventually die out. When 
fi = [i c , competition is introduced at a vanishingly small 
rate, and we find the critical situation where separation 
of scales occurs. 



For sandpile models, this a is arbitrarily set close to 
by using large lattice sizes (reducing dissipation) and 
waiting for avalanches to finish before introducing new 
perturbations (resulting in an infinitesimal driving rate 
and a diverging diffusion coefficient). For the biologi- 
cal and biologically-inspired systems we have considered, 
the control parameter is not set arbitrarily to a critical 
value. However, the dynamics of the evolutionary pro- 
cess, in which it is much harder to effect large jumps in 
fitness and function than it is to effect small ones, lead 
to naturally observed values of a being small, especially 
for higher taxonomic orders. The dynamics of evolution 
act, robustly, to keep /i near fi c . This in turn leads to a 
near power law pattern for rank-frequency distributions. 

We have shown that the apparent power laws of 
avalanches in species-abundance distributions in arti- 
ficial life systems, as well as rank-abundance distri- 
butions in taxonomy can be explained by modeling 
the dynamics of the underlying system with a sim- 
ple branching process. This branching process model 
successfully predicts, with no free parameters, the ob- 
served abundance distributions — including their diver- 
gence from power law. 

A branching process approach may allow the deduc- 
tion of the microscopic parameters of the system directly 
from the macroscopic abundance distribution. We find 
that we can identify a control parameter — the average 
number of new events an event directly spawns, and an 
order parameter — the rate of introduction of compet- 
ing events into the system, and that these are related in 
a form familiar from second order phase transitions in 
statistical physics. 
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power law distributions arise. 
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APPENDIX A: BINNING METHODS 



are added to this amount until the sum of these values is 
greater than the threshold value, 




E > T - 



(Al) 



FIG. 9. Binned avalanche size distribution for the BTW 
sandpile in the limit of infmitesimally slow driving (the stan- 
dard BTW protocol). The inset shows avalanche size distri- 
bution data after 100,000 avalanches. The main panel shows 
the same data binned using the data threshold method with 
T = 1000. That this binning method accurately reproduces 
the function this data is drawn from can be seen by compar- 
ing to the data set of 16 million avalanches (Fig. hoh, which 
shows no discernible differences between the predictions made 
by binning and the conclusions given by more data. 

When dealing with event distributions best plotted 
on single log or double log scales (such as exponential 
and power law distributions), care must be taken in the 
proper binning of the experimental data. Say we are in- 
terested in the probability distribution P(n) of an event 
distribution over positive integer values of n. We con- 
duct N trials, resulting in a data set Q(n) of number of 
events observed for every n value. For ranges of n where 
the expected or observed number of events for each n 
is much higher than 1, normally no binning is required. 
However, for ranges of n where Q(n) or P(n) is small, 
binning is necessary to produce both statistically signif- 
icant data points, and intuitively correct graphical rep- 
resentations. A constant bin size has several drawbacks: 
One must guess and choose an intermediate bin size to 
serve across a broad range of parameter space, and the 
shape and slopes of the curve (even in a double log plot) 
are distorted |10|. These disadvantages can be overcome 
by using a variable bin size. However, choosing bin sizes 
for variable binning is time-consuming and arbitrary — 
different choices will lead to different conclusions. We 
propose two related methods of systematically determin- 
ing appropriate variable bin sizes. Both methods lead 
to binned data which help in visualizing the underlying 
distribution (slopes and shapes are conserved). 

For the first method (the Data Threshold Method), we 
start by selecting a threshold value T. Starting from 
n = 1 and proceeding to higher values, no binning is 
done until a value of n is found for which Q(n) < T. 
When such a value n s is found, subsequent Q(n) values 



We then have a bin size (ni — n s + 1), with value 
Q{ n )- When plotting, it is convenient to plot 
this as a single point at the midpoint of [n s ,nz], with 
an averaged value, 



ni — n 



(A2) 



This yields a graphical representation with little distor- 
tion and good predictive power (Fig. ||). This binning 
procedure is continued until no more data remains to be 
binned. 




FIG. 10. Avalanche size distribution in the 2-d BTW 
sandpile model with infinitesimal driving rate (16 million 
avalanches) . 

The second binning method (the Template Threshold 
Method), uses a predicted probability distribution P(n), 
or a reasonable surrogate. Again, we define a threshold 
value for fitting T. However, in this case, the bin sizes 
are determined by comparing values of the expected dis- 
tribution 



E{n) = P(n) x N 



(A3) 



to T. Starting from n = 1 and proceeding to higher val- 
ues, no binning is done until a value of n is found for 
which E(n) < T. When such a value n s is found, sub- 
sequent E(n) values are added to this amount until the 
sum of these values is greater than the threshold value, 



E E ( n ) > T - 



(A4) 



We then have a bin of [n s ,ri;] with corresponding size 
(ni — n s + 1). The average value associated with this bin 
is 
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(A5) 



m - n s + 1 

This procedure is repeated until the data is exhausted. 
For this method, the data may be graphically represented 
either as a single point per bin (as in the data threshold 
method above), or as a point (showing the associated 
average value) for each measured (non-zero) data point 
Q(n). 

The data threshold method requires no a priori knowl- 
edge, and is a good predictor of the underlying distri- 
bution. However, when there are few data points, the 
template threshold method is more reliable. For both 
methods, a range of T should be tried and the best T 
(neither over- or under-binning) chosen. 
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